Nowcasting with Google Trends: a keyword selection method

Author

  • Andrew Ross
Abstract

Search engines such as Google keep a log of the searches entered into their websites. Google makes this data publicly available through Google Trends in the form of aggregate weekly search term volume. Aggregate search volume has been shown to nowcast (i.e. provide a real-time assessment of current activity) a variety of variables, such as influenza outbreaks, financial market fluctuations, unemployment and retail sales. Although identifying appropriate keywords in Google Trends is an essential element of using search data, a recurring difficulty identified in the literature is the lack of a technique for doing so. The main goal of this paper is therefore to put forward a method (the “backward induction method”) for identifying and extracting keywords from Google Trends that are relevant to economic variables.

Introduction

The growing use of the internet has made available a number of new data sources. In particular, the increasing use of the internet as an information-finding tool has created new ways to measure consumer sentiment and behaviour. Search engine providers such as Google keep a record of the searches entered into their websites (McLaren & Shanbhogue, 2011). Google has made some of this data available by publishing aggregate search volumes for specific search terms.

Data on search term volume can be used to analyse a variety of issues and variables. For example, the query volume for 'dishwashers', 'fridges' or 'flat screen televisions' can be used to explain demand for durable goods. The major catalyst for this area of research, however, was the work of Ginsberg et al. (2009), who used the query volume of influenza- and flu-related search terms (e.g. 'flu symptoms') to monitor flu outbreaks in real time. The availability of aggregate internet search data makes it possible to add a further method of economic analysis, one that attempts to explain current rather than future activity.
Internet search data is therefore mainly used to provide a real-time assessment of current activity, i.e. to nowcast rather than forecast (Aruoba & Diebold, 2010). Policy making relies on the availability of accurate and timely micro-, sub-macro- and macro-level data. Yet the majority of official data is published with a reporting lag of several weeks and may subsequently be revised (Choi & Varian, 2009a). Although numerous methods and econometric models are employed to provide timely economic analysis, the lag in official data may delay and distort rational policy making.

Real-time policy making is particularly significant in times of structural change or economic uncertainty, when the predictive power of models can break down (Castle et al., 2009). At such times, it is necessary to obtain timely, high-frequency data that remains robust during structural changes. The main advantage of internet search data is thus that it is made available with almost no lag (a maximum of one week) whilst covering a representative sample of the population.

Using internet search terms to nowcast economic variables requires the selection of explanatory keywords. Economic intuition can be used to identify keywords that explain sales of flat screen TVs, for example, but such intuition may not be sufficient when it comes to nowcasting complex economic variables. The main difficulty when using search term data is therefore the selection of individual search terms in Google Trends (GT) that are significant in explaining the economic variable under investigation.
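To make the keyword selection problem concrete, the following is a minimal sketch of a naive screening step followed by a one-regressor nowcast. It is not the backward induction method proposed in the paper, which is not described in this excerpt. It simply ranks candidate search-term series by their correlation with the target variable and fits a least-squares line on the best one. All series are synthetic, and the term names `relevant_term` and `irrelevant_term` are hypothetical.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return my - slope * mx, slope


# Synthetic data (an assumption for illustration): 104 weeks of two
# hypothetical search-volume series and a target that depends on one of them.
rng = random.Random(0)
n_weeks = 104
relevant = [rng.gauss(0, 1) for _ in range(n_weeks)]
irrelevant = [rng.gauss(0, 1) for _ in range(n_weeks)]
target = [2.0 + 1.5 * r + 0.1 * rng.gauss(0, 1) for r in relevant]

candidates = {"relevant_term": relevant, "irrelevant_term": irrelevant}

# Rank candidate keywords by absolute correlation with the target.
ranking = sorted(
    ((name, pearson(series, target)) for name, series in candidates.items()),
    key=lambda item: abs(item[1]),
    reverse=True,
)
best_name = ranking[0][0]

# Fit on all but the last week, then nowcast the last week from its
# search volume, which would be available before the official figure.
best_series = candidates[best_name]
intercept, slope = fit_line(best_series[:-1], target[:-1])
nowcast = intercept + slope * best_series[-1]

print(ranking)
print(f"nowcast for the latest week: {nowcast:.3f}")
```

Correlation screening of this kind only works when a list of candidate terms already exists, which is precisely the step that intuition alone may not supply for complex economic variables.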


Similar resources

Nowcasting with Numerous Candidate Predictors

The goal of nowcasting, or “predicting the present,” is to estimate up-to-date values for a time series whose actual observations are available only with a delay. Methods for this task leverage observations of correlated time series to estimate values of the target series. This paper introduces a nowcasting technique called FDR (false discovery reduction) that combines tractable variable select...


Macroeconomic Nowcasting Using Google Probabilities

Many recent papers have investigated whether data from internet search engines such as Google can help improve nowcasts or short-term forecasts of macroeconomic variables. These papers construct variables based on Google searches and use them as explanatory variables in regression models. We add to this literature by nowcasting using dynamic model selection (DMS) methods which allow for model s...


Forecasting the Incidence of Dementia and Dementia-Related Outpatient Visits With Google Trends: Evidence From Taiwan

BACKGROUND Google Trends has demonstrated the capability to both monitor and predict epidemic outbreaks. The connection between Internet searches for dementia information and dementia incidence and dementia-related outpatient visits remains unknown. OBJECTIVE This study aimed to determine whether Google Trends could provide insight into trends in dementia incidence and related outpatient visi...


Crowd-Squared: A New Method for Improving Predictions by Crowd-sourcing Google Trends Keyword Selection

Advances in information technologies and analytic tools have dramatically increased our ability to obtain accurate data on billions of economic decisions almost the instant that they are made. Services such as Google Trends aggregate billions of search queries and provide information about the search volume of different terms. This information from the “crowd”, had been successfully used to acc...


Adaptive nowcasting of influenza outbreaks using Google searches

Seasonal influenza outbreaks and pandemics of new strains of the influenza virus affect humans around the globe. However, traditional systems for measuring the spread of flu infections deliver results with one or two weeks delay. Recent research suggests that data on queries made to the search engine Google can be used to address this problem, providing real-time estimates of levels of influenz...



Publication date: 2014